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We report on a lattice based algorithm, completely vectorized for molecular dynamics 
simulations. Its algorithmic complexity is of the order O(N), where N is the number 
of particles. The algorithm works very effectively when the particles have short range 
interaction, but it is applicable to each kind of interaction. The code was tested on a 
Cray ymp el in a simulation of flowing granular material. 
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1. Introduction 

The properties of systems consisting of many interacting particles have been subjects 
of interest to scientists for hundreds of years. Since the equations of motion do not 
allow for an analytical solutionEl one has to calculate the trajectories of the particles 
by integrating Newtons equation of motion numerically for each particle i: 

?i = ^~-F i {r j ) (j = l,...,N) (1) 

This method is called molecular dynamics simulation. Molecular dynamics simula- 
tions have led to many important results, especially in fluid dynamics, solid state 
physics, polymer physics, and plasma physics. 

There are many methods to integrate sets of ordinary differential equations J3 
One of the most common is the GEAR-predictor-corrector method§ which we used 
in our simulations (see Appendix). 

The main problem in molecular dynamics is the calculation of the forces acting 
upon each of the particles. The term Fi(fj) may be very difficult to determine, but 
even if it is not too difficult and the forces depend only on the pairwise distances of 
the particles, the algorithmic complexity is at least of the order 0(N 2 ) since each 
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particle may interact with each other. In fact the probability that two particles 
which are a large distance apart at time t, and interact within a small time interval 
(t, t + At) (At is much larger than the integration step St) is very low, due to 
the probability distribution of the particle's velocities. That means that even if 
there is a long range force as the electrostatic force there is a critical threshold 
for the distance beyond which the particles do not influence each other. Therefore 
one can neglect the interaction of those particles during some time interval At 
and one can use lists of neighbors j for each particle i containing the particles j 
which are closer to particle i than a given thresholds The neighborhood-list has 
to be updated each time interval At. The asymptotic time-complexity, however, 
is not reduced because the calculation of the neighborhood list is still of the order 
0(N 2 ). Nevertheless the calculation time reduces drastically. Unfortunately in the 
general case there is no simple way to vectorize a molecular dynamics code using 
neighborhood lists, e.g. Ref. 5. There are hierarchical force calculation algorithms 
of complexity 0(Nlog(N)) (Ref. 6), and even of complexity O(N) (Ref. 7). These 
algorithms make use of fast multipole expansions, which cannot be applied to each 
type of particle interaction, as for the case of short range interaction, in particular if 
the force is not a steady function of the distance between two interacting particles. 

The aim of this paper is to describe a completely vcctorizable algorithm for 
molecular dynamics, which acts very effectively, provided the following precondi- 
tions hold: 

1. There is no long range interaction between the particles. 

2. The particles are allowed to deform only slightly. 

3. The dispersion of the particle size is not too large. 

4. The particle density, i.e. the number of particles per space unit is high. 

These preconditions are provided in the molecular dynamics simulations of granular 
material, as we will demonstrate below, which are of special interest. 

2. Description of the Algorithm 

Initially, we assume a region, where particles are allowed to move. One can 
also think about a periodically continued region to simulate periodic boundary 
conditions. Now we define a square lattice which covers this region and its cell size 
is determined by the requirement that not more than one particle can reside per 
lattice cell. Obviously the cell size d is determined by the smallest particles and 
by the smallest distance d m i n they can have during the simulation d = d m i n /\/2 
(fig# 

Given this lattice our algorithm acts as follows: 

1. Set the variables to their initial values. 

2. Predict the positions and the time derivatives due to the predictor-corrector 
algorithm as described in the Appendix. 
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Figure 1: The size of the lattice cell is determined by the smallest particles and 
their minimum distance during the simulation since there must not be more than 
one particle residing within each lattice cell at time t. 



3. Set up the lattice vectors: 



-j—j — J particle number if there resides a particle , , 

' J 1 otherwise 

i e [1, LMAX] 

t= — Tr., _ J 1 if there is a particle , . 

[0 otherwise 

i e [l, LMAX], 
where LMAX is the number of lattice cells. 



The vector Index maps all observables which belong to the particles to cor- 
responding lattice vectors, so that we get a set of vectors x, y, F x , F y , etc. 

Calculate the forces acting upon the particles. 

Because the particles are assumed to have short range interactions we only 
regard the interactions of a particle in cell with particles which reside 
within cells (k,l),(k€ [i-2,i+2],l € \j-2,j+2],(k,l)=£(i,j)). Therefore we 
may define a mask (fig. ||) which describes the range of particle interactions. 
To determine the interaction of each particle with the others we move the 
centre of the mask through all sites of the lattice and calculate the interactions 
of the particles which might lie within the mask. The index i, i £ [1,24], 
points to the mask positions which interacts with the regarded cell. Hence 
the distances as well as all interesting values of the particles may be expressed 
by 24 vectors indexed by the numbers of the lattice cells. 



disU = \/Ax % Ax % + Ay i Ay i i e [1, 24] (4) 

with 



Axi = x — x{i} (5) 



&y t = y-y{i}, (6) 



where the vector x{i} equals the vector x shifted due to the position of i 
according to the mask enumeration as given in fig. |2[ 
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Figure 2: Mask describing the area in the neighborhood of a particle, which resides 
in lattice cell 0. 



Now wc define a set of vector variables contacti i £ [1,24] which checks 
whether or not the particles touch each other. 

1 if the particles touch each other 
otherwise (7) 

ie [1,24]. 



contacti 



With these variables the normal and shear forces read as: 



Test ■ Test{i} ■ contact, ■ J"„(0, i) (8) 



F si = Test ■ Test{i} ■ contacti ■ F s (0, i), (9) 

where !F n (0,i) and F s (0,i) denote the normal and shear forces between par- 
ticles which possibly reside in mask position and i (i £ [1, 24]). 
The total forces and the momentum acting upon each particle are given by: 

24 1 

F x = ]T ==- • (Ax t ■ F ni + Ay, ■ F SI ) (10) 

2 = 1 * 

24 1 

F y = ^ = ■ (Ay< • F ni - Axi ■ F si ) (11) 
i=l disti 

24 

M = -Y j r-F- Sl . (12) 



5. Correct the predicted values due to the GEAR-algorithm as described in the 
Appendix. 

6. Increase the time t := t + St. 
Proceed with step 2. 

Assuming that the particles do not penetrate each other more than 10% of their 
radii, this algorithm works correctly if the ratio of the smallest and the largest radii 
does not exceed 1 : The cell size d has to be set to the value of the maximum 
radius (d = r max ). 

At each time-step one can extract all physical observables of interest. 



3. Efficiency of the Algorithm 

To evaluate the efficiency of the algorithm it is necessary to determine the num- 
ber of operations NOP of each step described in Section 2 and the corresponding 
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vector length n. Thereby we disregard the possibility of chaining. Given a pipeline 
depth D, one elementary vector operation requires a time of n + D clock periods 
instead of n ■ D which is the required time to process the equivalent code on a scalar 
machine. 

For each part of the algorithm we get the results summarized in the following 
table. 



Part of algorithm 


NOP 


n 


estimated time 


Predictor 


59 


NP 


59 • (NP + D) 


Index 


12 


NP 


12 ■ (NP + D) 


Forces 


1656 


LM AX 


1656 • (LMAX + D) 


Corrector 


54 


NP 


54 • (NP + D) 



NP number of particles 



LMAX number of lattice cells 

One iteration step of the algorithm needs the time: 

f = 125 ■ (NP + D) + 1656 • (LMAX + D) clock periods. 

That means the time depends strongly on the number of lattice cells but hardly 
on the number of particles. Fig. || shows that the time of the vectorized algorithm 
does not vary as rapidly with increasing number of particles as the time of the 
scalar algorithm. Figure ^ shows the calculation time (normalized by the number 
of particles) as a function of the number of particles for different particle densities 
P = lai ax ■ ^ ne re q mre d time for the neighborhood-list method does not depend 
on the particle density, because it works without the lattice. 

Figure 3: The processing time for vectorized calculations depends very weakly on 
the number of particles while the time for the scalar neighborhood-list method rises 
as N 2 . The pipeline depth D was assumed to be 20. 



Figure 4: Processing time as a function of the number of particles for different 
particle densities compared to the neighborhood-list method (dashed line). 

The density is obviously the limiting factor for the efficiency of the algorithm. 
To be more effective than the neighborhood-list method, the number of particles 
must increase with the reciprocal of the parameter p. Therefore the algorithm is 
constrained to problems with high particle density or to very large systems with a 
lower density. 

4. MD-Simulation of granular material using the new algorithm 

The algorithm described above was intended to simulate the flow of granular 
materials like dry sand. Moving sand reveals very astonishing effects. When flu- 
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idized by shaking, it can behave like a fluid, while at rest it mostly behaves like 
a solid. Many experiments and computer simulations of moving sand have been 
performed 

Using the proposed algorithm we investigated the flow of granular material 
through a vertical long narrow pipe with diameter d and length I, (I 3> d) and 
periodic boundary conditions. We want to describe our numerical simulations here 
only as an example to explain how the algorithm works and do not want to dis- 
cuss the numerical results in detail which can be found in Ref. 9. To simulate a 
rough surface, the pipe was built of slightly smaller particles which interact with 
the freely moving grains in the same manner as the freely moving grains act on each 
other (fig. ||). For this reason we need not distinguish between the interaction of 

Figure 5: The narrow pipe has a diameter d where only a few particles are allowed 
to fit. Its surface consists of slightly smaller particles to simulate a rough surface. 



the grains with each other and between the grains and the wall. The particle radii 
Ri have been chosen from a GAUSSian distribution with mean value Rq. For the 
force weassumed the Ansatz given by Cundall and Strackll3 and slightly modified 
by HaflS: 



4 = / F »-W%\ + F S'(°i ~o)'W^\ if|r--f J |<^+^ (13) 

otherwise 



with 



and 



where 



F N = k N - (Ri + Rj - \n - fjl) 1 ' 5 + 7A r • M eff ■ (fi - fj) (14) 

F s = min{-7s • M eff ■ v re i , A* ■ |-Fjv|}, (15) 

Vrel = {n - fj) + Ri-fli- Rj ■ Qj) (16) 

Mi ■ Mi , . 

M ef f = 3 -. 17 

u Mi + Mj 1 ; 



The terms fi, fi, Cli and Mj denote the current position, velocity, angular velocity 
and mass of the i th particle. The model includes an elastic restoration force which 
corresponds to the microscopic assumption that the particles can slightly deform 
each other. In order to mimic three-dimensional behaviour the HERTZian contact 
forcetS which increases with the power | was applied. The other terms describe 
the energy dissipation of the system due to collisions between particles according 
to normal and shear friction. The parameters 77V and 75 stand for the normal 
and shear friction coefficients. Eq. [ll] takes into account that the particles will not 
transfer rotational energy but slide on top of each other if the relative velocity at the 
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contact point exceeds a certain value which depends on the normal component of 
the force acting between the particles (COULOMB relationEj). For the parameters 
we have chosen 7A r = 1000 s~ l , 7s = 300 s~ l , k N = 100 N/m 1 - 5 and fi = 0.5. 
The system showed accurate numerical behaviour when the integration time step 
St = 10~ 4 s was used. 

After starting the simulation with a homogeneous particle distribution, one hnds 
the surprising result that the particles begin to form regions of different density 
which can move in both directions. These density waves are of no definite wave- 
length. The distances between the density maxima vary irregularly with time and 
depend strongly on the initial conditions. Figure 6 shows the evolution of the pipe. 
The pipe is drawn every 500 time steps, and the initial conditions are shown at the 
bottom. After detecting the density waves in our numerical simulations they were 
also found experimentally. For details see Ref. 9. 

Figure 6: The simulated flow of granular material through a pipe is shown in a 
sequence of snapshots. The snapshot on the bottom (t—0) shows the homogeneous 
particle distribution at the beginning of the simulation. The pipe is plotted every 
0.05 seconds starting at time t=10 sec. The density profile varies irregularly with 
time and depends essentially on the initial conditions. Gravity acts from left to 
right. 



5. Conclusion 

In conclusion we state that there is a lattice based algorithm of complexity O(N) 
which can be vectorized completely and which acts very effectively given certain 
preconditions, such as high particle density, low dispersion of the particle size and 
short range interactions between the particles. The algorithm was compared in 
detail with the neighborhood-list method. For the case of higher particle dispersion, 
a similar algorithm can be defined using a mask of size 7x7 instead of 5 x 5. The 
algorithm was applied in a molecular dynamics simulation of granular material, and 
there were no differences in the results compared to equivalent simulations using 
the classical neighborhood-list method. 
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Appendix A 

The algorithm of a Gear predictor-corrector method reads as follows: 

1. Predict the positions of the particles r? r (t + St) and the time derivatives 

* pr •• pr 

f i (t + St) , r i (t + St) , ■ ■ ■ up to the desired order of accuracy by Taylor 
expansion using the known values at time t: 



rf r (t + St) 

K pr (t + st) 

r? (t + St) 



n(t) + St ■ h{t) + \St 2 ■ fi(t) + i<5t 3 • f t (3) (t) + ■■■ 

^(t) + st-h(t) + ±st 2 ■? l (3 \t) + --- 

?,(t) + St ■ r t {3) (*) + ■■■ 



** COW 

2. Calculate the accelerations f i (t + St) acting upon the particles using the 
predicted values. 



3. Correct the predicted values using r i (t + St) 
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■■ corr , 

Ti (t + St) 



/ r.r rr (t + 5t) \ /r? r (t + 5t)\ ( Cf) \ 

■corr , , IJPT , „ . 

^ (t + (ft) f- (* + 5t) Ci 

V i / V / V J 

where c$ are constant values depending on the desired order of accuracy. 



Proceed with the first step with incremented time t := t + St. 



